ENH: ujson better handling of very large and very small numbers, throw ValueError for bad double_precision arg #4042 #4299

Merged
merged 1 commit into from
Jul 20, 2013

Komnomnomnom
Contributor

closes #4042

This makes ujson handle very big and very small numbers a bit better. It doesn't help with precision, but it should at least be able to handle very small and very large exponents now:

In [4]: from pandas.json import dumps

In [5]: dumps(1e-5)
Out[5]: '0.00001'

In [6]: dumps(1e-6)
Out[6]: '0.000001'

In [7]: dumps(1e-7)
Out[7]: '0.0000001'

In [8]: dumps(1e-8)
Out[8]: '0.00000001'

In [9]: dumps(1e-9)
Out[9]: '0.000000001'

In [10]: dumps(1e-10)
Out[10]: '0.0000000001'

In [11]: dumps(1e-11)
Out[11]: '0.0'

In [12]: dumps(1e-11, double_precision=15)
Out[12]: '0.00000000001'

In [13]: dumps(1e-12, double_precision=15)
Out[13]: '0.000000000001'

In [14]: dumps(1e-13, double_precision=15)
Out[14]: '0.0000000000001'

In [15]: dumps(1e-14, double_precision=15)
Out[15]: '0.00000000000001'

In [16]: dumps(1e-15, double_precision=15)
Out[16]: '0.000000000000001'

In [17]: dumps(1e-16, double_precision=15)
Out[17]: '1e-16'

In [18]: dumps(1e-16)
Out[18]: '1e-16'

In [19]: dumps(1e-17)
Out[19]: '1e-17'

In [20]: dumps(1e-40)
Out[20]: '1e-40'

In [21]: dumps(1e-100)
Out[21]: '1e-100'

In [22]: dumps(1e-400)
Out[22]: '0.0'

In [28]: dumps(1e40)
Out[28]: '1e+40'

In [29]: dumps(1e100)
Out[29]: '1e+100'

In [30]: dumps(1e400)
Out[30]: 'null'

In [31]: from pandas.json import loads

In [32]: loads(dumps(1e100))
Out[32]: 1e+100

In [33]: loads(dumps(1e40))
Out[33]: 1e+40

In [34]: loads(dumps(1e-40))
Out[34]: 1e-40
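
The outputs above suggest a simple rule: fixed-point notation with trailing zeros stripped for "ordinary" magnitudes, sprintf-style exponent notation outside that range, and 'null' for infinities. Here is a minimal pure-Python sketch of that rule. The function name `format_double` and the two thresholds are assumptions inferred from the transcript, not taken from the patch itself, so treat it as an illustration rather than the actual C implementation.

```python
import math


def format_double(value, precision=10):
    """Hypothetical re-creation of the formatting seen in the transcript."""
    # 1e400 overflows to inf before it ever reaches the encoder.
    if math.isnan(value) or math.isinf(value):
        return 'null'
    # Assumed thresholds: switch to exponent notation for very large or
    # very small non-zero magnitudes (inferred from the outputs above).
    if value != 0.0 and (abs(value) >= 1e16 or abs(value) < 1e-15):
        return '%.*g' % (precision, value)
    # Otherwise use fixed-point and strip trailing zeros, keeping one digit
    # after the decimal point.
    text = ('%.*f' % (precision, value)).rstrip('0')
    if text.endswith('.'):
        text += '0'
    return text


print(format_double(1e-10))      # 0.0000000001
print(format_double(1e-11))      # 0.0
print(format_double(1e-11, 15))  # 0.00000000001
print(format_double(1e-16))      # 1e-16
print(format_double(1e40))       # 1e+40
print(format_double(1e400))      # null
print(format_double(1e-400))     # 0.0
```

Note that `1e-400` underflows to `0.0` and `1e400` overflows to `inf` when Python parses the literal, which is why those two cases come out as '0.0' and 'null' whatever the precision.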

I have also modified it to throw a ValueError when a bad value is given for double_precision:

In [25]: dumps(1e-400, double_precision=-1)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-25-e15fa4642646> in <module>()
----> 1 dumps(1e-400, double_precision=-1)

ValueError: Invalid value '-1' for option 'double_precision', max is '15'

In [26]: dumps(1e-400, double_precision=16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-ab74b2f14c7f> in <module>()
----> 1 dumps(1e-400, double_precision=16)

ValueError: Invalid value '16' for option 'double_precision', max is '15'
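
For anyone who wants to mirror this check in calling code, a rough Python equivalent could look like the following. The helper name `check_double_precision` is hypothetical; only the 0 to 15 range and the wording of the message come from the tracebacks above.

```python
MAX_DOUBLE_PRECISION = 15  # upper bound quoted in the error message above


def check_double_precision(value):
    """Hypothetical helper: reject precisions outside 0..15."""
    if not 0 <= value <= MAX_DOUBLE_PRECISION:
        raise ValueError(
            "Invalid value '%d' for option 'double_precision', max is '%d'"
            % (value, MAX_DOUBLE_PRECISION)
        )
    return value


for candidate in (-1, 10, 16):
    try:
        print(candidate, '->', check_double_precision(candidate))
    except ValueError as exc:
        print(candidate, '->', exc)
```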

Tested on Python 2.7 on 64-bit Arch Linux. It would be great if someone could test this out on Windows.

@trottier

How about for numbers that make full use of double precision? E.g. 1.234567890123456e-40

Also, I trust @njsmith 's opinion over my own on this ... :)

@Komnomnomnom
Contributor Author

It seems to work OK:

In [1]: from pandas.json import dumps

In [2]: dumps(1.234567890123456e-40)
Out[2]: '1.23456789e-40'

In [3]: dumps(1.234567890123456e-40, double_precision=15)
Out[3]: '1.23456789012346e-40'

In [4]: dumps(1.234567890123456e-40, double_precision=16)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-02b1ef3af2eb> in <module>()
----> 1 dumps(1.234567890123456e-40, double_precision=16)

ValueError: Invalid value '16' for option 'double_precision', max is '15'

And at least it throws an error for an invalid precision setting now rather than silently capping it at 15.

@trottier

I'm a little concerned about the rounding up of 5 to 6, here:

In [3]: dumps(1.234567890123456e-40, double_precision=15)
Out[3]: '1.23456789012346e-40' 

@Komnomnomnom
Contributor Author

That'll be sprintf rounding, which is its standard behaviour when given a precision.
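
The rounding can be reproduced from Python, whose `%` operator follows the same printf conventions. This sketch assumes the encoder formats with something like `%.<precision>g`, which matches the outputs in this thread but is not confirmed here.

```python
x = 1.234567890123456e-40

# 15 significant digits: the 16th digit (6) rounds the trailing 5 up to 6.
print('%.15g' % x)  # 1.23456789012346e-40

# 16 significant digits keeps the 5 and shows the 6 that caused the rounding.
print('%.16g' % x)  # 1.234567890123456e-40

# The default precision of 10 drops the trailing zero as well.
print('%.10g' % x)  # 1.23456789e-40
```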

I'm investigating the use of PyOS_double_to_string (thanks @njsmith), which looks like it should reproduce what simplejson does.

@Komnomnomnom
Contributor Author

simplejson has the same rounding behaviour, though, just one decimal place further along, so I'm inclined to stick with the changes above (i.e. sprintf) for now:

In [9]: import json  # simplejson

In [10]: json.dumps(1.234567890123456e-40)
Out[10]: '1.234567890123456e-40'

In [11]: json.dumps(1.234567890123456789e-40)
Out[11]: '1.2345678901234568e-40'
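
One practical consequence of the extra digits is round-tripping. The sketch below uses the standard-library `json` module (as in the session above) and compares it with a 15-significant-digit string; the `'%.15g'` format is an assumption standing in for the `double_precision=15` output.

```python
import json

x = 1.234567890123456e-40

short = '%.15g' % x   # stands in for dumps(x, double_precision=15)
full = json.dumps(x)  # repr-style output

print(short, float(short) == x)     # 1.23456789012346e-40 False
print(full, json.loads(full) == x)  # 1.234567890123456e-40 True
```

So the 15-digit output is lossy for values that use the full precision of a double, whereas the repr-style output parses back to the same float.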

@jreback
Contributor

jreback commented Jul 20, 2013

@Komnomnomnom merge?

@Komnomnomnom
Contributor Author

I'm happy with it, and I don't have any other commits in the pipeline. There's probably scope for more improvements, but not in this PR.

jreback added a commit that referenced this pull request Jul 20, 2013
ENH: ujson better handling of very large and very small numbers, throw ValueError for bad double_precision arg #4042
@jreback jreback merged commit ec8920a into pandas-dev:master Jul 20, 2013
@jreback
Contributor

jreback commented Jul 20, 2013

thank you sir!
